QMDP-Net: Deep Learning for Planning under Partial Observability

Authors

Péter Karkus

David Hsu

Wee Sun Lee

Abstract

This paper introduces the QMDP-net, a neural network architecture for planning under partial observability. The QMDP-net combines the strengths of model-free learning and model-based planning. It is a recurrent policy network, but it represents a policy for a parameterized set of tasks by connecting a model with a planning algorithm that solves the model, thus embedding the solution structure of planning in a network learning architecture. The QMDP-net is fully differentiable and allows for end-to-end training. We train a QMDP-net on different tasks so that it can generalize to new ones in the parameterized task set and “transfer” to other similar tasks beyond the set. In preliminary experiments, QMDP-net showed strong performance on several robotic tasks in simulation. Interestingly, while QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning.
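The abstract names the QMDP algorithm without describing it. As a rough illustration only, the classical QMDP approximation can be sketched for a small tabular POMDP: run value iteration on the underlying fully observable MDP, then pick the action whose Q-values, weighted by the current belief, are highest. This is not the paper's network. The names `qmdp_q_values`, `qmdp_action`, `belief_update` and the two-state toy model (`T`, `R`, `Z`) are hypothetical and chosen for this sketch, and the belief update shown is the standard Bayes filter, assumed here rather than taken from the text.

```python
def qmdp_q_values(T, R, gamma=0.95, iters=200):
    """Value iteration on the fully observable MDP underlying the POMDP.

    T[a][s][s2] is a transition probability, R[a][s] an immediate reward.
    Returns Q[a][s] after a fixed number of sweeps.
    """
    n_actions, n_states = len(T), len(T[0])
    V = [0.0] * n_states
    Q = [[0.0] * n_states for _ in range(n_actions)]
    for _ in range(iters):
        Q = [[R[a][s] + gamma * sum(T[a][s][s2] * V[s2] for s2 in range(n_states))
              for s in range(n_states)]
             for a in range(n_actions)]
        V = [max(Q[a][s] for a in range(n_actions)) for s in range(n_states)]
    return Q


def qmdp_action(Q, belief):
    """QMDP rule: choose the action maximizing sum_s belief[s] * Q[a][s]."""
    scores = [sum(b * q for b, q in zip(belief, Q[a])) for a in range(len(Q))]
    return max(range(len(scores)), key=scores.__getitem__)


def belief_update(belief, T, Z, a, o):
    """Standard Bayes filter: predict with T, then weight by Z[a][s2][o]."""
    n = len(belief)
    predicted = [sum(belief[s] * T[a][s][s2] for s in range(n)) for s2 in range(n)]
    unnorm = [Z[a][s2][o] * predicted[s2] for s2 in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        # Observation has zero probability under the model; keep the prediction.
        return predicted
    return [u / total for u in unnorm]


# Hypothetical toy model: two states, action 0 = stay, action 1 = switch.
T = [
    [[1.0, 0.0], [0.0, 1.0]],  # stay
    [[0.0, 1.0], [1.0, 0.0]],  # switch
]
# Reward 1 only for staying in state 1.
R = [
    [0.0, 1.0],  # stay
    [0.0, 0.0],  # switch
]
# Observation o matches the true state with probability 0.9, for both actions.
Z = [
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.9, 0.1], [0.1, 0.9]],
]

Q = qmdp_q_values(T, R)
belief = belief_update([0.5, 0.5], T, Z, a=0, o=1)
action = qmdp_action(Q, belief)
```

Because QMDP plans as if the state became fully observable after one step, it never chooses actions purely to gather information. That limitation of the hand-coded algorithm is one plausible place where end-to-end learning could make a difference, though the abstract does not say so.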

Similar resources

Learning Plans for Safety and Reachability Goals with Partial Observability

Traditional planning assumes reachability goals and/or full observability. In this paper, we propose a novel solution for safety and reachability planning with partial observability. Given a planning domain, a safety property, and a reachability goal, we automatically learn a safe and permissive plan to guide the planning domain so that the safety property is not violated and which can force th...

Planning with Extended Goals and Partial Observability

Planning in nondeterministic domains with temporally extended goals under partial observability is one of the most challenging problems in planning. Simpler subsets of this problem have been already addressed in the literature, but the general combination of extended goals and partial observability is, to the best of our knowledge, still an open problem. In this paper we present a first attempt...

Product Representation of Belief Spaces in Planning under Partial Observability

We present a product representation of belief spaces for planning under partial observability. In earlier work we investigated backward plan construction based on a combination operation for belief states. The main problem in explicit construction of belief states is their high number. To remedy this problem, we refrain from representing individual belief states explicitly, and instead represen...

LTLf and LDLf Synthesis under Partial Observability

In this paper, we study synthesis under partial observability for logical specifications over finite traces expressed in LTLf/LDLf. This form of synthesis can be seen as a generalization of planning under partial observability in nondeterministic domains, which is known to be 2EXPTIME-complete. We start by showing that the usual “belief-state construction” used in planning under partial observ...

Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability

Many real-world tasks involve multiple agents with partial observability and limited communication. Learning is challenging in these settings due to local viewpoints of agents, which perceive the world as non-stationary due to concurrently exploring teammates. Approaches that learn specialized policies for individual tasks face problems when applied to the real world: not only do agents have to ...

Publication date: 2017
